Escape control characters in JSON output #20089
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @erickt (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see CONTRIBUTING.md for more information.
b'\n' => "\\n".into_cow(),
b'\r' => "\\r".into_cow(),
b'\t' => "\\t".into_cow(),
b'\x00'...b'\x1f' | b'\x7f' => format!("\\u00{:0>2x}", *byte).into_cow(),
Thanks for this! Unfortunately, this would be pretty expensive. Could you rewrite it so that it doesn't allocate here?
Well, I could make the match exhaustive with static strings, but it would be pretty long (32 lines). Is that acceptable?
Yeah, that would be. You could try something like:
for (i, byte) in bytes.iter().enumerate() {
    let escaped = match *byte {
        b'"' => "\\\"",
        b'\\' => "\\\\",
        b'\x08' => "\\b",
        b'\x0c' => "\\f",
        b'\n' => "\\n",
        b'\r' => "\\r",
        b'\t' => "\\t",
        b'\x00'...b'\x1f' | b'\x7f' => "\\u00",
        _ => { continue; }
    };
    if start < i {
        try!(wr.write(bytes[start..i]));
    }
    try!(wr.write_str(escaped));
    match *byte {
        b'\x00'...b'\x1f' | b'\x7f' => try!(write!(wr, "{:0>2x}", *byte)),
        _ => {}
    }
    start = i + 1;
}
Not perfect, but it at least doesn't force an allocation.
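For readers on current Rust, the suggestion above might look roughly like the sketch below. It is not the code that was merged: `write_escaped` is a hypothetical name, it assumes a `std::io::Write` sink, and it omits the surrounding quotes. It checks for the `\u00` prefix instead of re-matching the byte, so bytes that already have a short escape (such as `\n`) do not get hex digits appended.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes `s` to `wr` with JSON string escapes applied,
/// without building an intermediate String for each escaped byte.
fn write_escaped<W: Write>(wr: &mut W, s: &str) -> io::Result<()> {
    let bytes = s.as_bytes();
    let mut start = 0; // start of the pending run of bytes that need no escaping

    for (i, &byte) in bytes.iter().enumerate() {
        let escaped: &str = match byte {
            b'"' => "\\\"",
            b'\\' => "\\\\",
            b'\x08' => "\\b",
            b'\x0c' => "\\f",
            b'\n' => "\\n",
            b'\r' => "\\r",
            b'\t' => "\\t",
            // Remaining control bytes: emit the prefix here, hex digits below.
            b'\x00'..=b'\x1f' | b'\x7f' => "\\u00",
            _ => continue,
        };
        // Flush the unescaped run that precedes this byte.
        if start < i {
            wr.write_all(&bytes[start..i])?;
        }
        wr.write_all(escaped.as_bytes())?;
        if escaped == "\\u00" {
            // Two lowercase hex digits complete the \u00XX escape.
            write!(wr, "{:02x}", byte)?;
        }
        start = i + 1;
    }
    // Flush whatever follows the last escaped byte.
    if start < bytes.len() {
        wr.write_all(&bytes[start..])?;
    }
    Ok(())
}
```

Only ASCII bytes are ever escaped, so the slices written between escapes always fall on UTF-8 character boundaries.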
Oh, that looks good. The problem, though, is that some control characters are already in the list, so we'd need a pattern like b'\x00'...b'\x07' | b'\x0b' | b'\x0e'...b'\x1f' | b'\x7f'. Perhaps that's getting a little complex. I've now updated it with static strings only – let me know which one you prefer.
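As an illustration of the static-strings idea, here is a minimal sketch in current Rust. The PR itself spelled the cases out as match arms; the lookup table, the `CONTROL_ESCAPES` constant, and the `static_escape` function are assumptions made for brevity, not names from the PR.

```rust
/// Hypothetical table of static escapes for bytes 0x00..=0x1f, indexed by byte value.
const CONTROL_ESCAPES: [&str; 32] = [
    "\\u0000", "\\u0001", "\\u0002", "\\u0003", "\\u0004", "\\u0005", "\\u0006", "\\u0007",
    "\\b", "\\t", "\\n", "\\u000b", "\\f", "\\r", "\\u000e", "\\u000f",
    "\\u0010", "\\u0011", "\\u0012", "\\u0013", "\\u0014", "\\u0015", "\\u0016", "\\u0017",
    "\\u0018", "\\u0019", "\\u001a", "\\u001b", "\\u001c", "\\u001d", "\\u001e", "\\u001f",
];

/// Returns the static escape sequence for `byte`, or `None` if the byte
/// can be written to the output unchanged.
fn static_escape(byte: u8) -> Option<&'static str> {
    match byte {
        b'"' => Some("\\\""),
        b'\\' => Some("\\\\"),
        0x00..=0x1f => Some(CONTROL_ESCAPES[byte as usize]),
        0x7f => Some("\\u007f"),
        _ => None,
    }
}
```

Because every escape is a `&'static str`, no formatting or allocation happens per byte, and there is no overlapping range pattern to keep in sync with the short escapes.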
Looks good! Can you add yourself to the
Ok, done. Thanks!
Needs a rebase
Rebased
…cape Conflicts: src/libserialize/json.rs
The JSON spec (http://www.json.org) says that control characters are not allowed unescaped in JSON strings, but Rust currently does not escape them. This PR escapes ASCII control characters in JSON output.